Towards Developing a Multi-Dialect Morphological Analyser for Arabic
Authors
Abstract
In this paper we address the problem of analysing multi-dialect Arabic morphology. Our method is based on the synthesis of two methods. The first method is linguistically based: it uses an adapted Modern Standard Arabic (MSA) morphological analyser that first handles dialect prefixes and suffixes and then analyses the remaining word. This method improves the accuracy of analysing dialect words by 69%. The second method segments the word and then uses "the web as corpus" to estimate the frequencies of different segment combinations, which are then used to guess the correct base form. The combined system is shown to achieve 94% accuracy on a corpus of Arabic dialects.
Keywords: Morphology Analyser; Multi-Dialect; Web Corpus
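The abstract gives no implementation details, so the following is only a minimal Python sketch of how the two methods could be combined, under stated assumptions. The affix lists are illustrative Egyptian/Levantine clitics, not the paper's lists; `msa_lexicon` is a toy set standing in for the adapted MSA analyser; `web_freq` is a dictionary of made-up counts standing in for web hit counts; and `candidate_segmentations` and `analyse` are hypothetical names. The second method is also simplified here to scoring the stem alone rather than full segment combinations.

```python
from itertools import product

# Illustrative dialect clitics only (an assumption; the abstract does not list the
# paper's affixes): progressive "ب" (b-), future "ه"/"ح" (ha-/ḥa-), and the
# negation circumfix "ما...ش" (ma-...-sh) of Egyptian and Levantine Arabic.
DIALECT_PREFIXES = ["", "ب", "ه", "ح", "ما"]
DIALECT_SUFFIXES = ["", "ش"]


def candidate_segmentations(word, min_stem=2):
    """Return every (prefix, stem, suffix) split allowed by the affix lists."""
    splits = []
    for pre, suf in product(DIALECT_PREFIXES, DIALECT_SUFFIXES):
        if word.startswith(pre) and word.endswith(suf):
            stem = word[len(pre):len(word) - len(suf)]
            if len(stem) >= min_stem:  # also rejects overlapping affixes
                splits.append((pre, stem, suf))
    return splits


def analyse(word, msa_lexicon, web_freq):
    """Hypothetical combination of the two methods.

    msa_lexicon: set of stems a stand-in MSA analyser accepts.
    web_freq: dict mapping a stem to a hit count (stand-in for web queries).
    """
    candidates = candidate_segmentations(word)
    if not candidates:
        return ("", word, "")
    # Method 1: prefer splits whose stem the MSA analyser can analyse.
    accepted = [c for c in candidates if c[1] in msa_lexicon]
    pool = accepted or candidates
    # Method 2: pick the most frequent stem; max() keeps the earliest
    # candidate on ties.
    return max(pool, key=lambda c: web_freq.get(c[1], 0))


# Toy data for illustration only (not from the paper).
msa_lexicon = {"يكتب", "كتب"}
web_freq = {"يروح": 5000, "حيروح": 300}

print(analyse("بيكتب", msa_lexicon, web_freq))   # ('ب', 'يكتب', '')
print(analyse("ماكتبش", msa_lexicon, web_freq))  # ('ما', 'كتب', 'ش')
print(analyse("حيروح", msa_lexicon, web_freq))   # ('ح', 'يروح', '')
```

Trying the MSA analyser first and using frequency only to break ties or as a fallback is one plausible ordering; the abstract does not say how the two methods are weighted in the paper's synthesis.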
Similar resources
Borrowing the Verb “ast” and Its Varieties in Arabic Dialect of Sarab
“Borrowing” is a linguistic process studied in diachronic linguistics, in which one language borrows elements from another. It usually occurs in areas where two languages come into contact. Such borrowing occurs in a dialect spoken in South Khorasan, whose Arab speakers probably immigrated to this part of Iran in the early centuries of Islam. In thi...
Reducing out-of-vocabulary in morphology to improve the accuracy in Arabic dialects speech recognition
This thesis has two aims: developing resources for Arabic dialects and improving the speech recognition of Arabic dialects. Two important components are considered: Pronunciation Dictionary (PD) and Language Model (LM). Six parts are involved, which relate to finding and evaluating dialects resources and improving the performance of systems for the speech recognition of dialects. Three resource...
YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer
In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator. Our system is almost five times faster than the state-of-the-art MADAMIRA system, at slightly lower quality. In addition to speed, YAMAMA outputs a rich representation which allows for a wider spectrum of use. In this regard, YAMAMA transcends other systems, such as FARASA, which is faster but ...
Building a Shallow Arabic Morphological Analyser in One Day
The paper presents a rapid method of developing a shallow Arabic morphological analyzer. The analyzer will only be concerned with generating the possible roots of any given Arabic word. The analyzer is based on automatically derived rules and statistics. For evaluation, the analyzer is compared to a commercially available Arabic Morphological Analyzer.
Graphone Model Interpolation and Arabic Pronunciation Generation
This paper extends n-gram graphone model pronunciation generation to use a mixture of such models. This technique is useful when pronunciation data is for a specific variant (or set of variants) of a language, such as a dialect, and only a small amount of pronunciation dictionary training data for that specific variant is available. The performance of the interpolated n-gram graphone model i...